Predicting hospitalisation for heart failure and death in patients with, or at risk of, heart failure before first hospitalisation: a retrospective model development and external validation study

Summary

Background
Identifying people who are at risk of being admitted to hospital (hospitalised) for heart failure and death, and particularly those who have not previously been hospitalised for heart failure, is a priority. We aimed to develop and externally validate a prognostic model involving contemporary deep phenotyping that can be used to generate individual risk estimates of hospitalisation for heart failure or all-cause mortality in patients with, or at risk of, heart failure, but who have not previously been hospitalised for heart failure.

Methods
Between June 1, 2016, and May 31, 2018, 3019 consecutive adult patients (aged ≥16 years) undergoing cardiac magnetic resonance (CMR) at Manchester University National Health Service Foundation Trust, Manchester, UK, were prospectively recruited into a model development cohort. Candidate predictor variables were selected according to clinical practice and literature review. Cox proportional hazards modelling was used to develop a prognostic model. The final model was validated in an external cohort of 1242 consecutive adult patients undergoing CMR at the University of Pittsburgh Medical Center Cardiovascular Magnetic Resonance Center, Pittsburgh, PA, USA, between June 1, 2010, and March 25, 2016. Exclusion criteria for both cohorts included previous hospitalisation for heart failure. Our study outcome was a composite of first hospitalisation for heart failure or all-cause mortality after CMR. Model performance was evaluated in both cohorts by discrimination (Harrell's C-index) and calibration (assessed graphically).

Findings
Median follow-up durations were 1118 days (IQR 950–1324) for the development cohort and 2117 days (1685–2446) for the validation cohort. The composite outcome occurred in 225 (7·5%) of 3019 patients in the development cohort and in 219 (17·6%) of 1242 patients in the validation cohort. The final, externally validated, parsimonious, multivariable model comprised the predictors: age, diabetes, chronic obstructive pulmonary disease, N-terminal pro-B-type natriuretic peptide, and the CMR variables, global longitudinal strain, myocardial infarction, and myocardial extracellular volume. The median optimism-adjusted C-index for the externally validated model across 20 imputed model development datasets was 0·805 (95% CI 0·793–0·829) in the development cohort and 0·793 (0·766–0·820) in the external validation cohort. Model calibration was excellent across the full risk profile. A risk calculator that provides an estimated risk of hospitalisation for heart failure or all-cause mortality at 3 years after CMR for individual patients was generated.

Interpretation
We developed and externally validated a risk prediction model that provides accurate, individualised estimates of the risk of hospitalisation for heart failure and all-cause mortality in patients with, or at risk of, heart failure, before first hospitalisation. It could be used to direct intensified therapy and closer follow-up to those at increased risk.

Funding
The UK National Institute for Health Research, Guerbet Laboratories, and Roche Diagnostics International.
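A Cox-based risk calculator such as the one described in the Findings typically converts a patient's linear predictor into an absolute risk at a fixed horizon using the baseline survival at that horizon. The sketch below assumes the standard Cox form S(t | x) = S0(t)^exp(β·(x − x̄)). The coefficient values, covariate means, baseline survival and variable names are hypothetical placeholders, not the published model, and only three of the seven predictors are shown.

```python
import math

# HYPOTHETICAL placeholder coefficients, covariate means and baseline survival:
# none of these values are reported in the summary above, and only three of
# the seven predictors are included to keep the sketch short.
COEFFICIENTS = {"age_years": 0.03, "diabetes": 0.40, "ln_nt_probnp": 0.35}
MEANS = {"age_years": 60.0, "diabetes": 0.2, "ln_nt_probnp": 5.0}
BASELINE_SURVIVAL_3Y = 0.93  # hypothetical S0(3 years) for a patient at the means


def linear_predictor(patient):
    """Centred Cox linear predictor: sum of beta_j * (x_j - mean_j)."""
    return sum(beta * (patient[name] - MEANS[name]) for name, beta in COEFFICIENTS.items())


def three_year_risk(patient):
    """Predicted 3-year risk of the composite outcome: 1 - S0(3) ** exp(LP)."""
    return 1.0 - BASELINE_SURVIVAL_3Y ** math.exp(linear_predictor(patient))


if __name__ == "__main__":
    example = {"age_years": 72.0, "diabetes": 1.0, "ln_nt_probnp": math.log(900.0)}
    print(f"Estimated 3-year risk: {three_year_risk(example):.1%}")
```

A patient whose covariates equal the means has a linear predictor of zero, so the predicted risk is simply 1 − S0(3 years).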


Supplemental results
Prognostic modelling that considered left ventricular ejection fraction, rather than global longitudinal strain, as a candidate predictor
Models considering fractional polynomial transformations of continuous covariates showed that linear transformations were selected most frequently across the 20 imputed datasets, except for NT-proBNP, which was most frequently fitted with a natural logarithmic transformation (Supplemental Table 11), in keeping with previous studies. 1 The parsimonious multivariable model included age, diabetes, COPD, ln(NT-proBNP), LV ejection fraction, right ventricular ejection fraction, body surface area-indexed left atrial area, myocardial infarction and myocardial ECV (Supplemental Table 12A and B).
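The fractional polynomial step above chooses, for each continuous covariate, a power transformation and records which one is selected in each imputed dataset. The sketch below shows only the first-degree (FP1) transformation with the conventional power set, where power 0 means the natural logarithm and power 1 means a linear term, plus a tally of the modal choice across datasets. The selection criterion used by the authors is not reproduced here, and the function names and example selections are hypothetical.

```python
import math
from collections import Counter

# Conventional FP1 power set; by convention, power 0 denotes ln(x).
FP1_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp1_term(x, power):
    """Return the first-degree fractional polynomial term for x > 0."""
    if x <= 0:
        raise ValueError("FP terms need a strictly positive covariate; shift it first")
    if power not in FP1_POWERS:
        raise ValueError(f"unsupported FP1 power: {power}")
    return math.log(x) if power == 0 else x ** power


def most_frequent_power(selected_powers):
    """Tally the power chosen in each imputed dataset and return the modal one."""
    return Counter(selected_powers).most_common(1)[0][0]


# Hypothetical selections for one covariate across 20 imputed datasets.
selections = [0] * 17 + [0.5] * 2 + [1]
print(most_frequent_power(selections))  # 0, i.e. a natural-log transformation
```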

Model performance: internal validation
Variable selection was consistent across the imputed datasets. QRS duration was selected as an additional variable in 18 of the 20 datasets; however, nested multivariable Wald tests pooled across the 20 datasets showed that QRS duration did not significantly contribute to model performance (p = 0·073).
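For a single added term such as QRS duration, a nested Wald test reduces to a one-degree-of-freedom test on that coefficient. One common way to pool it across imputed datasets is Rubin's rules, sketched below; the exact pooling procedure used by the authors is not stated here, so treat this as an illustration. The p value uses a large-sample normal approximation; a t reference distribution with Rubin's degrees of freedom would be more exact, and multi-parameter tests need a multivariate extension not shown. The function name is hypothetical.

```python
import math
from statistics import fmean, variance


def pooled_wald_test(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate, total variance, Wald z statistic and a
    two-sided p value from a large-sample normal approximation.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    q_bar = fmean(estimates)            # pooled point estimate
    u_bar = fmean(variances)            # average within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b     # Rubin's total variance T
    z = q_bar / math.sqrt(total)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return q_bar, total, z, p_two_sided
```

Inflating the within-imputation variance by the between-imputation spread is what keeps the pooled test from being overconfident about a term whose estimate varies between imputed datasets.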
The multivariable model satisfied the proportional hazards assumption, evaluated using Schoenfeld residual testing pooled with the D2 method. 2,3 Univariably, LV ejection fraction showed some evidence of an association with time (Supplemental Table 13); however, plots of the scaled Schoenfeld residuals against time in the 20 imputed datasets showed that this relationship was negligible (Supplemental Figure 8). Alternative models allowing penalised spline fits for continuous covariates satisfied the proportional hazards assumption at univariable and multivariable levels (Supplemental Table 14), but the improvement in model performance was negligible. 4 In light of this, and to facilitate future clinical use of the model, the linear expression of LV ejection fraction was preserved.
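Schoenfeld residual tests yield a chi-square statistic in each imputed dataset, and the D2 method named above combines those statistics into a single F-referenced statistic. The sketch follows the usual D2 formulation: r2 is (1 + 1/m) times the variance of the square-rooted statistics, D2 = (mean/k − (m + 1)/(m − 1)·r2)/(1 + r2), with denominator degrees of freedom k^(−3/m)(m − 1)(1 + 1/r2)². The function name is hypothetical; a p value can be taken from the upper tail of F(k, df2), for example with `scipy.stats.f.sf(d2, k, df2)` if SciPy is available.

```python
import math
from statistics import fmean, variance


def pool_chi_square_d2(chi_squares, df):
    """Combine m chi-square statistics (each on df degrees of freedom) with D2.

    Returns (D2, numerator df, denominator df); the pooled statistic is
    referred to an F distribution with those degrees of freedom.
    """
    m = len(chi_squares)
    if m < 2:
        raise ValueError("D2 needs statistics from at least two imputed datasets")
    roots = [math.sqrt(d) for d in chi_squares]
    # Average relative increase in variance due to missingness.
    r2 = (1 + 1 / m) * variance(roots)
    d2 = (fmean(chi_squares) / df - (m + 1) / (m - 1) * r2) / (1 + r2)
    # With no between-imputation spread, the reference distribution is chi-square / df.
    df2 = math.inf if r2 == 0 else df ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return d2, df, df2
```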
The median optimism-adjusted C-index for the parsimonious model across the 20 imputed datasets was 0·810 (95% CI 0·796–0·833) (Supplemental Table 15). The 3-year calibration plot showing the predictive accuracy of the model is presented in Supplemental Figure 9. Calibration was high across the full range of predicted risk, with the calibration curve lying on or close to the reference line throughout. The calibration slope (0·928; Supplemental Table 15), integrated calibration index (ICI; 0·002) and E90 (0·003) all indicated excellent calibration.
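The ICI and E90 summarise the distance between each patient's predicted risk and the observed risk read off a smoothed calibration curve: the ICI is the mean absolute difference and E90 is its 90th percentile. The sketch below assumes the smoothed observed risks are already available (the calibration figures here use flexible hazard regression, which is not reproduced); the function name is hypothetical.

```python
from statistics import fmean, quantiles


def ici_and_e90(predicted, smoothed_observed):
    """Return (ICI, E90) from per-patient predicted and smoothed observed risks.

    ICI is the mean absolute difference between predicted risk and the
    smoothed observed risk; E90 is the 90th percentile of those differences.
    """
    if len(predicted) != len(smoothed_observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 patients")
    errors = [abs(p - o) for p, o in zip(predicted, smoothed_observed)]
    ici = fmean(errors)
    # method="inclusive" interpolates linearly between order statistics.
    e90 = quantiles(errors, n=10, method="inclusive")[-1]
    return ici, e90
```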

Model performance: external validation
The model was re-derived in the development cohort using the matched pool of candidate predictor variables. Variable selection was consistent across the 20 imputed datasets, and between the models derived from the original candidate predictor variables and from the matched pool. The only differences in model construction were that indexed left atrial area and right ventricular ejection fraction, which were not available in the validation cohort, were absent from the model. Schoenfeld residual testing showed that the proportional hazards assumption was met, except, as before, for LV ejection fraction, which again showed a univariable association with time (Supplemental Table 18). However, further investigation again showed that this association was minimal (Supplemental Figure 10) and that alternative model fitting led to negligible model improvement (Supplemental Table 19); thus, the linear expression of LV ejection fraction was preserved.
The median optimism-adjusted C-index for the externally validated model across the 20 imputed model development datasets was 0·809 (95% CI 0·797–0·832), and the calibration slope, adjusted for model-fitting optimism, was 0·943 (Supplemental Table 20). The optimism-adjusted C-index for the model in the validation cohort was 0·787 (95% CI 0·760–0·814); unadjusted discrimination was similar. The 1-year and 3-year optimism-adjusted calibration plots, updated to account for the different risk profile of the validation cohort, show that the model achieves good calibration (Supplemental Figure 11; calibration plots before baseline hazard updating are presented in Supplemental Figures 12 and 13). Kaplan-Meier curves also show good model performance (Supplemental Figure 14).
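Harrell's C-index, reported above for both cohorts, is the proportion of comparable patient pairs in which the patient who had the outcome earlier was assigned the higher predicted risk. The O(n²) sketch below assumes right-censored data with an event indicator; pairs with tied follow-up times are not counted as comparable here (conventions vary), and the bootstrap optimism adjustment and pooling across the 20 imputed datasets are not included. The function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. It is concordant when subject i
    also has the higher predicted risk; tied risk scores count as one half.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0·5 corresponds to random ranking and 1·0 to perfect ranking of patients by risk.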

Supplemental tables and figures

Supplemental Figure 2. Scaled Schoenfeld residual plots for global longitudinal strain (GLS) against time in the 20 imputed datasets
Supplemental Figure 3. Calibration plot of bootstrap resampling estimates of the predicted probability of hospitalisation for heart failure or all-cause mortality at 3 years versus observed probabilities, using the flexible hazard regression approach. 5,6 The smooth black line is the apparent calibration, and the blue line is the bootstrap optimism-corrected (overfitting-corrected) calibration curve; both were estimated by adaptive linear spline hazard regression. The grey line is the line of identity and represents perfect calibration. Mean |error| is equivalent to the ICI, and the 0.9 quantile is equivalent to E90. 7 A rug plot of the distribution of predicted outcome probabilities sits on the top axis of the plot. Survival is survival free of hospitalisation for heart failure or all-cause mortality.


Supplemental Figure 5. Scaled Schoenfeld residual plots for global longitudinal strain (GLS) against time in the 20 imputed datasets for the externally validated model

Supplemental Figure 8. Schoenfeld residual plots for LV ejection fraction against time in the 20 imputed datasets
Supplemental Figure 9. Calibration plot of bootstrap resampling estimates of the predicted probability of hospitalisation for heart failure or all-cause mortality at 3 years versus observed probabilities, using the flexible hazard regression approach. 5,6 The smooth black line is the apparent calibration, and the blue line is the bootstrap optimism-corrected (overfitting-corrected) calibration curve; both were estimated by adaptive linear spline hazard regression. The grey line is the line of identity and represents perfect calibration. Mean |error| is equivalent to the ICI, and the 0.9 quantile is equivalent to E90. 7 A rug plot of the distribution of predicted outcome probabilities sits on the top axis of the plot. Survival is survival free of hospitalisation for heart failure or all-cause mortality.

Supplemental Table 21. Preliminary analysis comparing model discrimination when heart failure was the primary diagnosis versus when heart failure was the primary or secondary diagnosis. Outcome: hospitalisation for heart failure or all-cause mortality.

Further analyses
The following results are not described in the main manuscript but are included for additional information.
Model performance by heart failure stage
A preliminary evaluation was conducted to compare model discrimination in subgroups of patients with stage A or B versus stage C or D heart failure in the development cohort, using a complete case analysis (Supplemental Table 23). Model discrimination appears high in both groups, albeit marginally higher in stages A and B.